Pharmacoepidemiology and Drug Safety
Wiley
Preprints posted in the last 30 days, ranked by how well they match Pharmacoepidemiology and Drug Safety's content profile, based on 12 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.
Riera-Arnau, J.; Paoletti, O.; Gini, R.; Thurin, N. H.; Souverein, P. C.; Abtahi, S.; Duran, C. E.; Pajouheshnia, R.; Roberto, G.
Background: In pharmacoepidemiological studies, the days of treatment (DoT) associated with individual electronic drug utilization records (DUR) are usually missing. Researcher-defined duration (RDD) calculation approaches, as opposed to data-driven approaches, estimate DoT from the specific choices and assumptions made by investigators. These choices are usually underreported or even undocumented. We aimed to develop a framework for the standardization of terminology, formulas, implementation, and reporting of possible RDD approaches. Methods: A systematic classification of RDD calculation approaches was developed via expert consensus. Universal concepts used to operationalise RDDs were identified and described using standard terminologies. An open-source R function, CreateDoT, was created to implement the formulas, taking the universal concepts as input parameters. A step-by-step workflow was developed to facilitate implementation and reporting. Results: RDD approaches were classified into two main classes: I) daily dose (DD)-based calculation approaches (n=3 formulas), and II) fixed-duration approaches (n=2). Seven universal concepts were identified to describe the five corresponding generalized formulas for DoT calculation. Input parameters of the CreateDoT function can be retrieved from source data through its mapping to universal concepts, or entered by the investigator based on the chosen calculation approach. The input file structure itself represents a standard reporting template for documenting investigators' assumptions and methodological choices adopted for DoT calculation. Conclusions: The CreateDoT framework can facilitate the documentation and reporting of RDD approaches for DoT calculation, increasing the transparency and reproducibility of pharmacoepidemiological studies regardless of the data model used, and facilitates sensitivity analyses to evaluate the impact of alternative assumptions in DoT calculation.
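As a rough illustration of the two classes of researcher-defined duration approaches named in this abstract, the sketch below computes days of treatment either from a daily dose or as a fixed duration per package. The function names, parameter names, and the specific formulas (total dispensed amount divided by an assumed daily dose; packages multiplied by an assumed number of days per package) are assumptions made for illustration. They are not the five formulas or the interface of the CreateDoT R function, which the abstract does not spell out.

```python
def dot_daily_dose(n_packages: float, units_per_package: float,
                   strength_per_unit: float, daily_dose: float) -> float:
    """Daily-dose-based sketch: total dispensed amount / assumed daily dose.

    All four inputs are hypothetical names; strength_per_unit and
    daily_dose must be expressed in the same unit (e.g. mg).
    """
    if daily_dose <= 0:
        raise ValueError("daily_dose must be positive")
    total_amount = n_packages * units_per_package * strength_per_unit
    return total_amount / daily_dose


def dot_fixed(n_packages: float, days_per_package: float) -> float:
    """Fixed-duration sketch: each package is assumed to cover a set number of days."""
    if days_per_package <= 0:
        raise ValueError("days_per_package must be positive")
    return n_packages * days_per_package


if __name__ == "__main__":
    # 2 packages x 30 tablets x 500 mg, at an assumed 1000 mg/day
    print(dot_daily_dose(2, 30, 500, 1000))
    # 1 package assumed to last 28 days
    print(dot_fixed(1, 28))
```

Running both functions on the same record with different assumed inputs is one simple way to see how sensitive an exposure window is to the investigator's choice of approach.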
Heckmann, N. S.; Papoutsi, D. G.; Barbieri, M. A.; Battini, V.; Molgaard, S. N.; Schmidt, S. O.; Melskens, L.; Sessa, M.
Background: Biomedical Large Language Models (LLMs) combined with prompt engineering offer domain-specific reasoning, yet their application to individual-level causality assessment remains unexplored. This study evaluated five combinations of biomedical LLMs, prompting strategies, and causality algorithms by comparing their agreement with two human expert evaluators. Research design and methods: A total of 150 Individual Case Safety Reports (ICSRs) were analyzed: 140 reports from the Food and Drug Administration Adverse Event Reporting System (FAERS), and 10 myocarditis/pericarditis ICSRs from the Vaccine Adverse Event Reporting System (VAERS). Assessments were conducted using the Naranjo and WHO-UMC algorithms. Biomedical LLMs tested included TinyLlama 1.1B, Medicine LLaMA-3 8B, and MedLLaMA v20, combined with Chain-of-Thought (CoT) or Decomposition prompting. Agreement was measured using Gwet's Agreement Coefficient 1 (AC1) and percentage agreement, alongside performance metrics and qualitative error analysis. Results: The Medicine LLaMA-3 8B-Naranjo-CoT combination achieved the highest agreement with human assessors for the final classification of causality (64%). Biomedical LLMs demonstrated low inter-rater agreement on critical items of causality assessment such as identification of listed AEs, temporal plausibility, alternative causes, and objective evidence of AEs. Frequent model failures included irrelevant responses. Conclusions: Biomedical LLMs showed improved performance over the general-purpose models previously tested but remain suboptimal for reliable causality assessment of ICSRs.
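For readers unfamiliar with the agreement statistic used in this abstract, the following is a minimal sketch of Gwet's AC1 for two raters and nominal categories. It uses the standard two-rater definition, AC1 = (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and p_e = (1 / (Q - 1)) * sum over categories of pi_q * (1 - pi_q), with pi_q the mean of the two raters' marginal proportions for category q. The function name and the example labels are hypothetical; this is not the authors' analysis code.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def gwet_ac1(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
             categories: Optional[Sequence[Hashable]] = None) -> float:
    """Gwet's AC1 for two raters rating the same items on a nominal scale."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    # Prefer the full rating scale; fall back to the categories actually observed.
    if categories is None:
        cats = list(dict.fromkeys(list(rater_a) + list(rater_b)))
    else:
        cats = list(categories)
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    p_a = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the averaged marginal proportions
    p_e = sum(
        pi * (1 - pi)
        for pi in ((count_a[c] + count_b[c]) / (2 * n) for c in cats)
    ) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    model = ["probable", "probable", "possible", "possible"]
    human = ["probable", "possible", "possible", "possible"]
    print(round(gwet_ac1(model, human), 4))
```

Passing the full list of scale categories (for example all Naranjo or WHO-UMC classes) matters because Q enters the chance-agreement term even when some categories are never used.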
Hedfords Vidlin, S.; Giunchi, V.; K-Papai, L.; Sandberg, L.; Zaccaria, C.; Sakai, T.; Piccolo, L.; Rocca, E.; Fusaroli, M.; Trinh, N. T.
Background: Post-marketing surveillance is essential for complementing the safety profiles of medicinal products, especially for populations generally excluded from clinical trials such as pregnant individuals. However, the absence of a standardised pregnancy indicator in the electronic transmissions of adverse event reports hampers their correct identification in pharmacovigilance databases and complicates the study of safety concerns related to pregnancy exposures. Three recently developed rule-based algorithms with the common aim of systematically retrieving pregnancy-related reports differ in scope and are tailored to different databases (A. FAERS, B. EudraVigilance, C. VigiBase). Aim: To compare the design and outputs of the three pregnancy algorithms. Methods: This study was a collaboration among the authors of the three pregnancy algorithms. We harmonised their rules, implemented them in an R package to enable execution in both VigiBase and FAERS, and analysed key characteristics of reports flagged by each algorithm. Results: The pregnancy algorithms A, B, and C flagged 235653, 279515, and 446957 reports respectively in VigiBase, and 265015, 260734, and 350479 in FAERS. Reports exclusively retrieved by each algorithm (994, 3248, and 142324 in VigiBase, and 1528, 1100, and 59643 in FAERS) were mostly explained by Algorithm A having no age restriction, Algorithm B excluding normal pregnancy and ineffective contraception, and Algorithm C excluding paternal exposure. Conclusions: Differences in flagging were largely related to varying scopes. Understanding commonalities and differences is crucial for empowering professionals working with pregnancy-related pharmacovigilance to select and use the most appropriate algorithm for their specific needs.
Key points:
- Three independently developed algorithms were designed to retrieve pregnancy-related adverse event reports and support research into pregnancy-specific safety concerns.
- By applying these algorithms to VigiBase and FAERS, we highlighted overlaps and differences in the reports they flag, reflecting heterogeneous scope and implementation.
- Awareness of these distinctions is essential for users to select and apply the most suitable algorithm for their specific needs.
Levi, J.; Cross, S.; Ramesh, N.; Venter, F.; Hill, A.
Objectives: To estimate potential launch prices of generic semaglutide following patent expiry from 2026 and to quantify the global obesity and type 2 diabetes (T2DM) burden in countries where generic access may become possible. Methods: We used World Bank population data and World Obesity and Diabetes Atlas prevalence estimates to calculate obesity and T2DM burden in countries where semaglutide patents expire in 2026 or were not filed. Patent status was identified using MedsPaL and cross-checked with regional databases. We updated established cost-plus pricing methodologies using 2024-2025 Indian API shipment data to estimate production costs for oral and injectable semaglutide, incorporating formulation, packaging, taxation, and profit assumptions. Results: Ten countries with 2026 patent expiry represent 44% of the global population and 48% of the global obesity burden. No patent filings were identified in 150 additional countries. By the end of 2026, generic injectable semaglutide could be distributed in 160 countries where 69% of global T2DM and 84% of clinical obesity occur. Estimated generic injectable costs ranged from $28-140 per person-year, while oral formulations ranged from $186-380 per person-year. Injection devices contributed disproportionately to total cost. Conclusion: Patent expiry could substantially expand access to semaglutide at dramatically lower prices, particularly in high-burden settings. However, device costs, secondary patents, and health system constraints may limit equitable uptake without coordinated policy action.
Study Importance
What is already known about this subject?
- Semaglutide is highly effective for obesity and cardiometabolic disease but remains unaffordable in many low- and middle-income countries due to high branded prices and patent protections.
- Previous cost-plus analyses show that generic competition can substantially reduce prices of essential medicines after patent expiry.
What are the new findings in your manuscript?
- Using 2024-2025 API shipment data, we estimate generic injectable semaglutide could be produced for $28-140 per person-year following 2026 patent expiry.
- By 2026, generic semaglutide could be available in 160 countries comprising 69% of global T2DM and 84% of clinical obesity burden.
How might your results change the direction of research or the focus of clinical practice?
- Provides an evidence base for procurement planning and price negotiations ahead of patent expiry.
- Highlights the importance of addressing device costs and secondary patents to ensure equitable global access.
Hassan, F.; Lou, J. Y.; Lim, C. T.; Ong, W. Q.; Rumaizi, N. N.
Artificial intelligence (AI), particularly large language models (LLMs), is increasingly explored in healthcare, yet its real-world usability and safety in high-risk clinical pharmacy tasks remain uncertain. Vancomycin therapeutic drug monitoring (TDM), which requires precise pharmacokinetic calculations and context-sensitive interpretation within a narrow therapeutic window, provides a stringent test case for AI-assisted decision support. This proof-of-concept study developed and evaluated a hybrid clinical decision support system (TDM-AID) integrating a validated deterministic pharmacokinetic calculation engine, GPT-4o-based structured clinical interpretation, and retrieval-augmented guideline support. Thirty retrospective adult vancomycin TDM cases were assessed using a weighted six-domain rubric covering pharmacokinetic accuracy, AUC estimation, prospective prediction, timing recommendations, clinical judgment, and documentation quality. Two independent expert pharmacists evaluated system outputs against benchmark consultations. The overall median performance was 78% (IQR 12%), classified as Acceptable, and 73% (IQR 14%) when deterministic calculations were excluded. Foundational pharmacokinetic calculations achieved 100% accuracy. Clinical judgment demonstrated Good performance (83%), whereas prospective prediction was limited (58%), and timing recommendations were absent in all cases. Safety violations occurred in 17% of cases, including dose recommendations exceeding 4 g/day. Inter-rater reliability was good (ICC 0.87). These findings suggest that hybrid AI-driven decision support is technically feasible and usable as a pharmacist-augmenting draft generator; however, limitations in predictive reasoning, timing logistics, and safety enforcement necessitate deterministic safeguards and mandatory expert oversight before clinical implementation.
Bu, F.; Wu, R.; Ostropolets, A.; Aminorroaya, A.; Chen, H. Y.; Chai, Y.; Dhingra, L. S.; Falconer, T.; Hsu, J. C.; Kim, C.; Lau, W. C.; Man, K. K.; Minty, E.; Morales, D. R.; Nishimura, A.; Thangraraj, P.; Van Zandt, M.; Yin, C.; Khera, R.; Hripcsak, G.; Suchard, M. A.
Background: GLP-1 receptor agonists (GLP-1RAs) and SGLT2 inhibitors (SGLT2Is) have established cardiovascular benefits for patients with type 2 diabetes mellitus (T2DM), with similar class-level effectiveness found in previous studies. However, real-world comparative effectiveness assessments of individual agents remain limited. Objectives: To compare the cardiovascular effectiveness of individual GLP-1RAs and SGLT2Is. Methods: We conducted a multi-national, retrospective, new-user active-comparator cohort study using 10 US and non-US administrative claims and electronic health record databases. The study included 1,245,211 adults with T2DM receiving metformin who initiated second-line therapy with one of six GLP-1RAs (albiglutide, dulaglutide, exenatide, liraglutide, lixisenatide, semaglutide) or one of four SGLT2Is (canagliflozin, dapagliflozin, empagliflozin, ertugliflozin). Empagliflozin (393,499; 31.6%), semaglutide (235,585; 18.9%), dapagliflozin (208,666; 16.8%), and dulaglutide (207,348; 16.8%) were most commonly used. A secondary subgroup analysis included 316,242 patients with established cardiovascular diseases (CVD). Primary outcomes were 3-point major adverse cardiovascular events (MACE: acute myocardial infarction, stroke, sudden cardiac death) and 4-point MACE (adding hospitalization/ER visit with heart failure). Secondary outcomes included the individual components. Hazard ratios (HRs) were estimated for pairwise agent comparisons while on-treatment (per-protocol) and over total follow-up using Cox proportional hazards models, with propensity score adjustments, negative control calibration, and pre-specified study diagnostics to guard against potential confounding. Random-effects meta-analysis produced summary HR estimates across data sources that passed diagnostics. Results: Across the study cohort, individual GLP-1RAs and SGLT2Is demonstrated broadly similar cardiovascular effectiveness, both within and across drug classes.
For example, semaglutide and empagliflozin showed comparable risks for 3-point MACE (meta-analytic HR 1.05; 95% CI 0.79-1.39) and 4-point MACE (meta-analytic HR 0.95; 95% CI 0.81-1.12), with consistent findings in the CVD subgroup. Study diagnostics confirmed adequate equipoise, covariate balance and statistical power to detect similarity in HRs between 0.8 and 1.2 for commonly used agents. Conclusions: In this large-scale real-world study, individual GLP-1RAs and SGLT2Is exhibited largely comparable cardiovascular benefits, including in patients with established CVD. These findings align with network meta-analytic estimates from major cardiovascular outcome trials and broadly support current treatment guidelines. Clinical choices should be guided by relevant factors such as safety, adherence, tolerability, cost, and patient preference, where further work is needed.
Guo, W.; Wang, M.; Shin, J.; Li, F.; O'Brien, E. C.; Bortfeld, K.; Zhao, A.; Glover, L.; McDevitt, R.; Kalapura, C.; Wu, S.; Shibeika, S.; Aymes, S.; Porter, M.; Mac Grory, B.; Lusk, J. B.
Background and Aims: The glucagon-like peptide-1 receptor agonist (GLP-1 RA) semaglutide has demonstrated efficacy for the secondary prevention of cardiovascular disease among patients with overweight/obesity without diabetes mellitus. However, the comparative effectiveness of GLP-1 RAs versus other antiobesity medications (e.g. phentermine-topiramate) has not been evaluated. Methods: This was a retrospective, observational cohort study applying target trial emulation methodology to the Truveta electronic health record database of more than 120 million patients. Adult patients with a body mass index (BMI) ≥27 kg/m² and a history of cardiovascular disease (prior ischemic stroke, transient ischemic attack, or myocardial infarction, or known coronary artery disease, heart failure, or peripheral artery disease) without diabetes mellitus were included in the study. The primary endpoint was time to first major adverse cardiovascular or cerebrovascular event (MACCE, defined as stroke or myocardial infarction). Results: In total, 35,240 patients were included in the bupropion-naltrexone versus GLP-1 RA comparison, and 27,051 were included in the phentermine-topiramate versus GLP-1 RA comparison. In the pre-weighting cohort, GLP-1 RA use was associated with decreased hazard of MACCE compared to bupropion-naltrexone (HR 0.50 [95% confidence interval (CI) 0.36-0.69]) and phentermine-topiramate (HR 0.43 [95% CI 0.30-0.60]). In the propensity score-overlap weighted cohort, GLP-1 RA prescription was not associated with a lower hazard of MACCE than bupropion-naltrexone (aHR 0.69 [95% CI 0.47-1.00]) but was associated with a lower hazard compared to phentermine-topiramate (aHR 0.61 [95% CI 0.41-0.91]; adjusted absolute rate difference 0.98 per 1000 person-years). Conclusions: Prescription of a GLP-1 RA was associated with a lower risk of subsequent MACCE than phentermine-topiramate.
Pradhan, A. M.; Shetty, V. A.; Gregor, C.; Graham, J. H.; Tusing, L.; Hirsch, A. G.; Hall, E.; Troiani, V.; Davis, M. P.; Bieler, D. L.; Romagnoli, K. M.; Kraus, C. K.; Piper, B. J.; Wright, E. A.
Introduction: Recreational and medical cannabis use (CU) information is often available within the electronic health record (EHR) in a format that is impractical for health care provider use. Transformation of free-text EHR documentation in notes to discrete elements is possible using natural language processing (NLP) and has the potential to characterize CU efficiently. The objective of this study was to develop an NLP algorithm to identify documentation of CU within EHR unstructured clinical notes. Methods: We identified EHR notes with cannabis-related terminologies through a keyword search among all Geisinger patients with at least one encounter between 1/1/2013 and 6/30/2022. We trained four NLP models to classify notes into six categories based on time, context, and reliability of CU documentation identified through manual annotation. We compared the demographic characteristics of patients with positive classification for CU using the best-performing model to those of the overall population. Results: Of the over 1.7 million eligible patients, 150,726 (8.6%) were flagged as cannabis users. Bio-ClinicalBERT, a transformer-based NLP model, achieved close to human performance in classifying CU (weighted Precision=91.4, Recall=93.3, F-score=92.4). Cannabis users had higher BMI and were at least nine-fold more likely to use tobacco, alcohol, and illicit substances. Conclusion: Our study evaluated the prevalence of CU documentation across the entire corpus of EHR notes data without population segmentation. The NLP methodologies used achieved performance close to that of human annotation and laid the foundation for identifying and classifying CU within unstructured data sources, with future applications in research and patient care.
Plain Language Summary: Marijuana, also known as cannabis, may impact the health of patients, yet it is not routinely captured in medical records, and when documented, it is often found in unstructured formats (e.g., progress notes) rather than in discrete fields. Incomplete and unstructured capture limits many functional capabilities within the EHR that enhance patient care (e.g., drug interactions, notifications) and limits researchers' ability to identify patients routinely exposed to marijuana. The transformation of free-text documentation of cannabis use (CU) into discrete elements can be performed using natural language processing (NLP). The objective of this study was to develop an NLP model to identify CU in unstructured clinical notes in the EHR. We examined the EHRs of Geisinger patients in Pennsylvania over a 10-year period. Among 1.7 million patients, 9% were identified as cannabis users. One of the NLP models tested, Bio-ClinicalBERT, achieved the highest performance. Cannabis users had a higher BMI and were ten-fold more likely to be tobacco users, ten-fold more likely to use alcohol, and nine-fold more likely to use illicit substances. NLP can be used to better understand the risks and benefits of CU at a population level and may improve patient identification to assist clinical decision-making. Future CU epidemiological research should continue to explore other avenues to automate and improve CU documentation by leveraging rapidly evolving technologies, such as artificial intelligence-driven tools.
Pfaffenlehner, M.; Dressing, A.; Knoerzer, D.; Wagner, M.; Heuschmann, P.; Scherag, A.; Binder, H.; Binder, N.
Background: Routinely collected health data are increasingly used to generate real-world evidence for therapeutic decision-making. Yet, stakeholders, including clinicians, pharmaceutical industry representatives, patient advocacy groups, and statisticians, prioritize different aspects of data quality, analysis, and interpretation. Without explicit consideration of these perspectives, analyses risk being fragmented, misaligned with end-user needs, or lacking transparency. Methods: We developed a stakeholder-inclusive conceptual framework for modeling routine health data, informed by an interdisciplinary workshop and supported by targeted literature examples. The framework maps stakeholder priorities to methodological requirements and identifies analytical strategies that enable integration of diverse perspectives. Results: Clinicians prioritize interpretability and clinical relevance; the pharmaceutical industry emphasizes regulatory compliance and real-world evidence generation; patient groups highlight transparency, inclusion of patient-reported outcomes, and privacy protection; and statisticians focus on bias control and methodological rigor. Our framework illustrates how these priorities can be explicitly incorporated into modeling strategies. Multistate models exemplify a methodological approach that operationalizes these requirements by capturing dynamic disease trajectories, integrating intermediate outcomes, and offering graphical interpretability. Beyond specific methodological choices, clinical research relies fundamentally on statistical expertise. Depending on the research goal, statisticians' roles can range from providing statistical consultations for standard analyses, to applying or adapting advanced methods for more complex analyses, to developing new methods for research questions whose specific characteristics require novel approaches.
Conclusions: The stakeholder-inclusive framework provides methodological guidance for designing analyses of routine health data that are clinically meaningful, scientifically rigorous, and socially acceptable. By aligning the research question with the intended perspective from the beginning, it supports more robust and transparent evidence generation, with multistate models serving as a flexible tool to operationalize this integration.
Kadinde, A.; Sangeda, R. Z.; Masatu, F. C.; Mwalwisi, Y. H.; Nkilingi, E. A.; Fimbo, A. M.
Background: Antibiotic pricing is a key determinant of access and stewardship in low- and middle-income countries (LMICs), yet empirical evidence on how prices are formed within pharmaceutical markets remains limited. In particular, there is little longitudinal evidence on how antibiotic prices behave within national pharmaceutical supply systems. This study evaluated the patterns and determinants of systemic antibiotic pricing in Tanzania using national regulatory import permit data. Methods: We conducted a retrospective analysis of antibiotic importation records from the Tanzania Medicines and Medical Devices Authority for 2010-2016. Systemic antibiotics for human use imported for oral or parenteral administration were included. Unit prices (USD per smallest unit of measure) were summarized using the median and interquartile range (IQR). Prices were compared by route of administration, supplier country, and product naming practice (INN-named versus brand-named) using Mann-Whitney U and Kruskal-Wallis tests with false discovery rate adjustment. Results: Of the 14,301 records, 10,894 (76.2%) met the inclusion criteria. Oral antibiotics predominated (89.6%). Although the median oral antibiotic prices declined over time, substantial price dispersion persisted across all study years. Parenteral antibiotics were consistently more expensive (USD 0.755-3.370) and more variable than oral antibiotics. Importation was concentrated in a few medicines, with amoxicillin-clavulanate (16.7%) and amoxicillin (11.4%) accounting for over one-quarter of records, and in a few supplier countries, with India representing 44.9% of the records. Significant price differences between INN-named and branded products were observed for amoxicillin (adjusted p<0.001) and ciprofloxacin (adjusted p=0.018), and prices differed significantly by supplier country across major medicines (adjusted p<0.05).
Across medicines and years, wide within-product price distributions indicate persistent market segmentation rather than price convergence. Conclusions: Antibiotic import prices in Tanzania exhibit systematic and reproducible variations associated with formulation type, supplier origin, and product naming practices. The findings indicate that procurement structure and supplier participation strongly influence pricing in the import-dependent pharmaceutical market. Monitoring import-level prices can serve as an upstream indicator of market conditions and support evidence-informed procurement, pricing regulations, and antimicrobial stewardship policies in LMIC settings.
Kravos, A.; Dolenc, B.; Fartek, N.; Locatelli, I.; Cebron Lipovec, N.; Rogelj Meljo, N.; Kos, M.; Dobovsek, T.; Panter, G.
Iron deficiency (ID) is the most common nutritional deficiency worldwide, often caused by insufficient dietary intake. Oral supplementation is one means of improving iron status. This study evaluated the efficacy and safety of two low-dose iron supplements - >Your< Iron Forte Capsules (YIFC) and Ferrous Sulfate Capsules (FSC) - in individuals with dietary ID. One hundred and one participants (mean age 30.6 years; 98% women) with low iron stores (mean serum ferritin 16.1 µg/L) were randomized to receive either YIFC or FSC once daily for 12 weeks. Changes in blood indices and iron-related parameters were assessed at four and 12 weeks of intervention relative to baseline. The primary outcome was the change in hemoglobin (Hb) after 12 weeks. Eighty-seven participants completed the study. Both supplements significantly increased Hb at 12 weeks (YIFC: mean 6.52 g/L, p<0.001; FSC: mean 5.71 g/L, p<0.001). Product-related adverse events (AEs) were few (17% of all AEs) and of mild to moderate intensity only. One participant receiving FSC withdrew due to a probable product-related AE. The frequencies of product-related AEs were similar between study arms; however, statistically significantly more AEs judged to be definitely related to the product occurred in the FSC arm. While product-related AEs were confined to the gastrointestinal tract in the YIFC arm, they affected multiple organ systems in the FSC arm. Supplementation with either YIFC or FSC proved to be an effective, well-tolerated, and safe strategy for improving iron status in non-anemic dietary iron deficiency. In terms of the AE profile, supplementation with YIFC may offer advantages over supplementation with FSC.
Liu, C.; Mayer, M.; Lactaoen, K.; Gomez, L.; Weissman, G.; Hubbard, R.
Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-exchangeability between internal and external control patients. To address this challenge, we developed a sensitivity analysis framework to assess the robustness of HCT results to potential unmeasured confounding. We propose a tipping point analysis that adapts the E-value framework to the HCT setting where trial participation rather than treatment assignment is subject to confounding. To aid interpretation, we also introduce a data-driven benchmark representing the strength of unmeasured confounding reflected by the observed outcome non-exchangeability. We then propose an operational decision rule and evaluate its performance through simulation studies. Finally, we illustrate the approach using an asthma trial augmented by data from electronic health records. Simulation results demonstrate that our decision rule safeguards against Type I error inflation while preserving the power gains achieved by incorporating external data. In settings where moderate unmeasured confounding led to poorer outcomes for external controls, Type I error was controlled near the nominal 5% level, and power increased by 10-20% compared with analyses using RCT data alone. Our approach provides a practical, interpretable method to assess HCT robustness, supporting rigorous inference when integrating external real-world data.
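The E-value that this abstract builds on can be sketched in a few lines. The code below implements only the standard E-value for a risk ratio, E = RR + sqrt(RR * (RR - 1)) after inverting estimates below 1; it is not the authors' adaptation to trial participation in hybrid controlled trials, nor their benchmark or decision rule, none of which the abstract specifies. The function name is hypothetical.

```python
import math


def e_value(risk_ratio: float) -> float:
    """Standard E-value for a point estimate on the risk-ratio scale.

    It is the minimum strength of association (as a risk ratio) that an
    unmeasured confounder would need with both exposure and outcome to
    fully explain away the observed estimate.
    """
    if risk_ratio <= 0:
        raise ValueError("risk_ratio must be positive")
    # Protective estimates are inverted so the formula applies to RR >= 1.
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


if __name__ == "__main__":
    for rr in (0.5, 1.0, 2.0):
        print(rr, round(e_value(rr), 3))
```

A tipping-point analysis in this spirit compares such a threshold with a plausible or benchmarked confounding strength: if the benchmark is weaker than the threshold, the conclusion is considered robust.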
Yuan, Y.; Peng, Z.; Doi, S. A. R.; Furuya-Kanamori, L.; Cao, H.; Lin, L.; Chu, H.; Loke, Y.; Mol, B. W.; Golder, S.; Vohra, S.; Xu, C.
Background: The number of problematic randomized clinical trials (RCTs) has risen sharply in recent decades, posing serious challenges to the integrity of the healthcare evidence ecosystem. Objective: To investigate whether retraction of problematic RCTs could reduce evidence contamination. Design: Retrospective cohort study. Setting: A secondary analysis of the VITALITY Study database. Participants: 1,330 retracted RCTs with 847 systematic reviews. Measurements: The difference in the median number (and its interquartile range, IQR) of contamination events before and after retraction, and the association between time-to-retraction and likelihood of evidence contamination. Results: Among these retracted RCTs, 426 led to evidence contamination, resulting in 1,106 contamination events (251 after retraction vs. 855 before retraction). The time interval between RCT publication and first contamination ranged from 0.2 to 30.9 years, with a median of 3.3 years (95% CI: 3.0 to 3.9). The median number of contaminated systematic reviews was lower after retraction than before retraction (0, IQR: 0 to 1 vs. 1, IQR: 1 to 2, P < 0.01). Compared with trials retracted more than 7.5 years after publication, those retracted between 1.0 and 1.8 years (OR = 0.70, 95% CI: 0.60 to 0.80) and those retracted within 1.0 year (OR = 0.69, 95% CI: 0.60 to 0.80) were associated with a lower likelihood of evidence contamination. Limitations: Only contaminated systematic reviews with quantitative synthesis were assessed, and the analysis was limited to retracted RCTs. Conclusions: Retracting problematic RCTs can significantly reduce evidence contamination, and faster retraction was associated with less contamination. To safeguard the integrity of the evidence ecosystem, academic journals should act promptly in the retraction of problematic studies to minimize their downstream impact. Primary Funding Sources: The National Natural Science Foundation of China (72204003, 72574229).
Zhang, L.; Higgins, I. A.; Dai, Q.; Gkatzionis, A.; Quistrebert, J.; Bashir, N.; Dharmalingam, G.; Bhatnagar, P.; Gill, D.; Liu, Y.; Burgess, S.
Mendelian randomization has emerged as a transformative approach for inferring causal relationships between risk factors and disease outcomes. However, applying Mendelian randomization to disease progression - a critical step in validating pharmacological targets - is hampered by index event bias. This form of selection bias occurs because analyses of disease progression are necessarily restricted to individuals who have already experienced the disease event. Here, we present a comprehensive evaluation of statistical methods designed to mitigate index event bias, including inverse-probability weighting, Slope-Hunter, and multivariable methods. We compare the performance of these methods in simulations and applied examples. Inverse-probability weighting methods reduce bias, but require individual-level data and will only fully eliminate bias when the disease event model is correctly specified. Slope-Hunter performed poorly in all simulation scenarios, even when its assumptions were fully satisfied. Multivariable methods worked best when including genetic variants that affect the incident disease event. However, if these genetic variants also affect disease progression directly, then the analysis will suffer from pleiotropy. Hence, if the same biological mechanisms affect disease incidence and progression, then multivariable methods will have little utility. But in such a case, analyses of disease progression are less critical, as conclusions reached from analyses of disease incidence are likely to hold for disease progression. Our findings indicate that no single method is a universal solution to provide reliable results for the investigation of disease progression. Instead, we propose a strategic framework for method selection based on data availability and biological context.
McIntyre, R. S.; Zhang-James, Y.; Goldberg, J. F.; Kwan, A. T.
GLP-1 receptor agonists (GLP-1 RAs) are effective in delaying progression of chronic kidney disease in individuals with type 2 diabetes mellitus (T2DM). We evaluated whether GLP-1 RA prescription is associated with reduced nephrotoxicity in adults receiving long-term lithium therapy. We conducted a retrospective, propensity score-matched cohort study using electronic health records from the TriNetX global network, which includes de-identified data from over 127 million patients across 109 healthcare organizations. The study population consisted of adults aged ≥18 years with T2DM, with lithium exposure within the 2 years preceding the index date and at least one prescription for a GLP-1 RA. The primary efficacy outcome was the rate of nephrotoxicity in persons with T2DM prescribed lithium and a GLP-1 RA versus those with T2DM prescribed lithium but no GLP-1 RA or other antidiabetic agents. Nephrotoxicity was a composite of ICD-10 and CPT-coded renal disease. Incidence and time-to-event outcomes were assessed using Kaplan-Meier curves and Cox proportional hazards models. In our 24-month analysis, 462 matched patient pairs were included. Initiation of a GLP-1 RA during lithium therapy was associated with a lower incidence of renal events versus lithium alone (6.1% vs 10.4%), corresponding to a risk difference of -4.3% (95% CI -7.86 to -0.80), a risk ratio of 0.58 (95% CI 0.37-0.91; p=0.017), and higher event-free survival (89.0% vs 83.2%; log-rank p=0.037). GLP-1 receptor agonist therapy was associated with a reduction in reports of lithium-associated nephrotoxicity. Our findings provide impetus to conduct mechanistic renal histopathologic studies combining GLP-1 RAs with lithium.
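The reported risk difference and risk ratio can be related back to the two incidence proportions. The following is a minimal Python sketch, assuming only the rounded percentages given in the abstract (6.1% and 10.4%); the function name `risk_measures` is illustrative, and confidence intervals are not reproduced because the underlying event counts are not reported here.

```python
def risk_measures(risk_exposed: float, risk_comparator: float) -> tuple[float, float]:
    """Return (risk difference, risk ratio) for two incidence proportions."""
    risk_difference = risk_exposed - risk_comparator
    risk_ratio = risk_exposed / risk_comparator
    return risk_difference, risk_ratio


# Rounded incidences from the abstract: GLP-1 RA plus lithium vs lithium alone.
rd, rr = risk_measures(0.061, 0.104)
print(f"risk difference = {rd:.1%}, risk ratio = {rr:.2f}")
```

With the rounded inputs the ratio comes out near 0.59, slightly above the reported 0.58, which was presumably computed from unrounded counts.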
Sakata, N.; Tanaka, Y.; Naganuma, K.; Takahashi, Y.; Momose, S.; Higashi, M.; Tabayashi, T.
Objectives: The therapeutic efficacy of rituximab has reduced the discriminatory power of the International Prognostic Index (IPI) in diffuse large B-cell lymphoma (DLBCL), particularly within intermediate-risk categories. To address this "risk dilution," we aimed to develop and internally validate the AB-IPI (Albumin-BCL2 Refined Prognostic Index) using a hypothesis-driven approach that integrates tumor burden, host fitness, and tumor biology. Methods: This multi-center retrospective study analyzed 289 patients with de novo DLBCL treated uniformly with R-CHOP immunochemotherapy. We combined the standard IPI with serum albumin < 3.6 g/dL (representing host fitness/rituximab pharmacokinetics) and BCL2 protein expression > 50% (representing tumor biology). The model was validated internally using bootstrapping with 1,000 resamples. This study adhered to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement for model development and internal validation (Type 1b). Results: During the observation period, 115 death events were recorded. Multivariate Cox regression identified albumin < 3.6 g/dL (hazard ratio [HR] 2.62), IPI score > 2 (HR 2.13), and BCL2 > 50% (HR 1.72) as independent prognostic factors. The model maintained a robust Events Per Variable (EPV) ratio of 38.3. The AB-IPI stratified patients into four distinct risk groups with 5-year overall survival rates of 88.0% (Low), 76.1% (Intermediate-1), 45.0% (Intermediate-2), and 29.0% (High). The calibration plot demonstrated excellent agreement between predicted and observed probabilities, with a calibration slope of 0.98, indicating minimal optimism and robust risk estimation. Decision Curve Analysis (DCA) demonstrated that the AB-IPI provided a superior Net Benefit across a wide range of clinically relevant threshold probabilities. Conclusions: The AB-IPI demonstrates superior clinical utility and calibration compared to the standard IPI. By identifying patients with compounded biological risks who are unlikely to be cured by R-CHOP alone, this score offers a practical framework for optimizing therapeutic strategies, such as the allocation of polatuzumab vedotin.
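The events-per-variable figure follows directly from the counts in the abstract. A small Python sketch, assuming the three predictors named in the Cox model (albumin, IPI score, BCL2) are the only variables counted; the function name `events_per_variable` is illustrative.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Events per variable: outcome events divided by candidate predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


# 115 deaths and 3 predictors (albumin, IPI score, BCL2), per the abstract.
epv = events_per_variable(115, 3)
print(f"EPV = {epv:.1f}")
```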
Wilson, F. A. A.; Garland, E. L.
OBJECTIVE: Opioid misuse exacts a tremendous toll on society. Mindfulness-Oriented Recovery Enhancement (MORE) is an efficacious treatment for opioid misuse. Yet, the cost-effectiveness of this intervention remains unknown. METHODS: Cost-effectiveness and cost-benefit analyses of a randomized clinical trial with enrollment of 250 adults with chronic pain prescribed long-term opioid therapy who were misusing opioids. Participants were randomized to MORE (training in mindfulness, reappraisal, and savoring positive experiences) or supportive group psychotherapy across 8 weekly 2-hour groups. Incremental cost-effectiveness ratios (ICERs) and benefit-to-cost ratios (BCRs) were computed using the primary outcome of opioid misuse at 9-month follow-up, as assessed by a composite measure based on self-report, clinical interview, and urine screen. RESULTS: The 250 randomized patients (64.0% female) had an average age of 51.8 years (SD=11.9), were mostly taking oxycodone or hydrocodone (69%), and had a mean morphine equivalent opioid dose of 101.0 (IQR=74) mg. At 9-month follow-up, the difference in the probability of having a positive Drug Misuse Index (DMI) rating was 0.24 (0.54 for MORE participants vs. 0.78 for controls). The ICER of MORE relative to supportive psychotherapy was $116.3 per averted case of opioid misuse, $8.9 per life-year, and $8.0 per quality-adjusted life-year. MORE is cost-saving vs. supportive psychotherapy after adjusting for healthcare costs. Excluding all benefits associated with averting fatal overdoses results in a BCR of 84.2. CONCLUSIONS: Given MORE's cost-effectiveness, private and public payers should consider disseminating this evidence-based therapy broadly across the nation to reduce mortality and morbidity associated with the ongoing opioid crisis. HIGHLIGHTS: (1) Mindfulness-Oriented Recovery Enhancement (MORE) substantially reduced opioid misuse among adults with chronic pain on long-term opioid therapy. (2) MORE was highly cost-effective vs. supportive psychotherapy, costing $116 per averted opioid misuse case, and MORE was cost saving when accounting for healthcare costs associated with opioid misuse. (3) Findings suggest wide dissemination of this evidence-based treatment could yield major healthcare and other economic benefits in addressing the opioid crisis.
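An incremental cost-effectiveness ratio is the extra cost divided by the extra effect. The Python sketch below illustrates that definition using the two misuse probabilities from the abstract. The incremental cost is not reported in the abstract, so the value used here is back-calculated from the reported ICER and should be read as an assumption for illustration only; the function name `icer` is likewise illustrative.

```python
def icer(delta_cost: float, delta_effect: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("delta_effect must be non-zero")
    return delta_cost / delta_effect


# From the abstract: misuse probability 0.78 (control) vs 0.54 (MORE).
averted_per_participant = 0.78 - 0.54

# ASSUMPTION: incremental cost is back-calculated from the reported ICER of
# $116.3 per averted case; the abstract does not report it directly.
implied_delta_cost = 116.3 * averted_per_participant

print(f"implied incremental cost per participant: ${implied_delta_cost:.2f}")
print(f"ICER: ${icer(implied_delta_cost, averted_per_participant):.1f} per averted case")
```

Under that assumption, the implied incremental cost is roughly $28 per participant.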
Rentsch, C. T.; Palzes, V.; Shi, M.; Setzer, M. R.; Malone, S. G.; Kline-Simon, A. H.; Piserchia, Z.; Winterland, E. L.; Leggio, L.; Lo Re, V.; Fiellin, D. A.; Tazare, J.; Farokhnia, M.; Sterling, S.; Kranzler, H. R.; Gray, J. C.
Alcohol use disorder (AUD) remains a major public health problem, with few effective medications and suboptimal adherence. L-type calcium channel blockers (LTCCBs) have genetic and preclinical support as potential treatments for AUD. We evaluated whether brain penetrant (BP)-LTCCBs are associated with reduced alcohol consumption by conducting two preregistered (https://osf.io/huawv) observational cohort studies using electronic health records (EHRs) from the US Department of Veterans Affairs (VA) and Kaiser Permanente Northern California (KPNC). New users of BP-LTCCBs (nifedipine or felodipine) were compared with new users of a non-BP-LTCCB (amlodipine) and with unexposed patients sampled from the same clinics, following a 180-day washout and requiring at least 60 days' supply. Propensity score matching was conducted separately for BP-LTCCB versus unexposed, non-BP-LTCCB versus unexposed, and BP- versus non-BP-LTCCB. The primary outcome was change in drinks per week from the most recent pre-index screen to end of follow-up, estimated using difference-in-differences (DiD) models. Prespecified subgroup analyses were conducted by AUD diagnosis, baseline drinking level, and sex. Across both health systems, BP-LTCCB initiation was not associated with greater reductions in drinks per week than either comparator, with broadly consistent findings across all subgroups. In two large, preregistered EHR-based cohorts with rigorous confounding control, BP-LTCCBs were not associated with reduced drinking relative to comparators. Despite compelling genetic and preclinical evidence, these results do not support repurposing BP-LTCCBs for AUD, highlighting the need to prioritize alternative pharmacologic targets, potentially within etiologically informed subgroups.
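A difference-in-differences estimate compares the change over time in the exposed group with the change in the comparator group. The Python sketch below shows the simplest two-group, two-period version on group means; it is not the study's regression model, and all input values are hypothetical rather than taken from the abstract.

```python
def did_estimate(pre_exposed: float, post_exposed: float,
                 pre_comparator: float, post_comparator: float) -> float:
    """Difference-in-differences of group means (two groups, two periods)."""
    change_exposed = post_exposed - pre_exposed
    change_comparator = post_comparator - pre_comparator
    return change_exposed - change_comparator


# HYPOTHETICAL mean drinks per week; not taken from the study.
estimate = did_estimate(
    pre_exposed=10.0, post_exposed=8.0,
    pre_comparator=10.0, post_comparator=8.5,
)
print(f"DiD estimate: {estimate:+.1f} drinks per week")
```

A negative estimate would indicate a larger reduction in the exposed group than in the comparator; the abstract reports no such additional reduction for BP-LTCCBs.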
Singh, M.; Larsson, D.; Zelano, J.
Background: Persons with epilepsy are at increased risk of depression/anxiety. Older antiseizure medications (ASMs) had drug-drug interactions that complicated pharmacotherapy of depression/anxiety; newer ASMs lack this drawback but can have psychiatric side effects. Anxiety/depression are increasingly recognized and treated pharmacologically. We hypothesized that the likelihood of treatment with selective serotonin reuptake inhibitors (SSRIs) would have increased in adult-onset epilepsy when prescription habits shifted towards newer ASMs. Methods: We linked national health registers and included 28569 persons with epilepsy incident in 2006-2020 and 68509 age- and sex-matched controls. We assessed the risk of starting SSRI treatment compared to age- and sex-matched controls across three incidence periods: 2006-2010, 2011-2015, and 2016-2020. Cox regression was used to estimate adjusted hazard ratios (HRs), and subgroup analyses explored age, sex, and comorbidities. Specialist psychiatric care was also assessed as a measure of more severe depression. Analyses including persons with SSRI use before the epilepsy diagnosis were used as sensitivity analyses. Findings: Persons with epilepsy had higher risks of starting SSRIs compared to controls; 1986/9561 (20.8%) received an SSRI during follow-up after epilepsy in 2006-2010 and 2020/9165 (22.0%) in 2016-2020; adjusted HRs were 1.92 (95% CI: 1.79-2.06) in 2006-2010, 1.84 (95% CI: 1.72-1.97) in 2011-2015, and 1.81 (95% CI: 1.69-1.94) in 2016-2020. Among individuals aged 18-30 years at their epilepsy diagnosis, the proportion receiving SSRIs remained the same between the first and last calendar periods (18.2%). Because of increased treatment of controls, the adjusted HR of SSRI treatment in this age group decreased from 2.33 (95% CI: 1.96-2.78) to 1.63 (95% CI: 1.39-1.91). The HR of specialist psychiatric care was not significantly different between the time periods. Most comorbidities were consistently associated with increased likelihood of SSRI treatment, whereas intellectual disability decreased the likelihood in some periods. Interpretation: We found no evidence of overall increased SSRI initiation or psychiatric care after the shift to newer ASMs. Persons with epilepsy remain more likely to receive SSRI treatment, but probably not to a level matching the higher prevalence of depression. Increased SSRI treatment of younger adults has not been matched by increased treatment of young adults with epilepsy. This suggests a potentially widening treatment gap and a need for increased recognition of depression in young adults with epilepsy. Funding: Swedish Research Council (2023-02816), Swedish state through the ALF-agreement (ALFGBG-1006343), Knut och Ragnvi Jacobsson foundation, Swedish Society for Medical Research (S18-0040), Swedish Society of Medicine (SLS-881501), Epilepsifonden, Rune och Ulla Amlovs stiftelse.
Dymm, B.; Goldenholz, D. M.
Importance: Large language models (LLMs) offer potential decision support, but their accuracy varies. Prompt engineering can generally enhance LLM behavior in a clinical context, yet best practices have yet to be formally explored in realistic neurology settings. Objective: To evaluate the impact of structured prompting versus simple prompting on the performance of six LLMs (three closed-source: OpenAI GPT-4o, OpenAI o3, OpenAI GPT-5.2 Thinking; three open-source: Meta Llama-4-Scout-17B-16E-Instruct, Llama-3.3-70B-Instruct-Turbo, and the reasoning model R1-1776) for thrombolytic clinical decision support (CDS) in acute stroke. Design: Models responded to three novel ischemic stroke vignettes using either a simple question ("Should this patient be offered thrombolytics?") or a five-step structured prompt (CARDS) guiding information extraction, timing analysis, contraindication checking, decision process explanation, and risk-benefit discussion. Outputs were assessed across seven domains: guideline adherence, unsafe recommendations, risk recognition, guideline grading accuracy, inclusion of conversational explanation, clarity, and overall helpfulness. Results: Structured prompts significantly enhanced performance across most domains, with varying effects between model families. For some closed-source models (GPT-4o, o3), prompts structured in the CARDS style improved guideline adherence from 83.3% to 100%, eliminated unsafe recommendations (16.7% to 0%), and increased specific guideline grading accuracy from 0% to 100%. The closed-source reasoning model GPT-5.2 Thinking similarly achieved 100% adherence, 0% unsafe recommendations, and 100% grading accuracy with structured prompts, while also maintaining perfect safety and risk recognition under simple prompting. Similarly, the open-source reasoning model R1-1776 achieved these top-tier outcomes (100% adherence, 0% unsafe, 100% grading, 100% conversation) when structured prompts were applied, with grading and conversation improving from 0%. In contrast, other open-source models (Llama-4-Scout, Llama-3.3-70B) showed more modest gains: risk recognition improved (83.3% to 100%) and guideline grading accuracy increased (0% to 66.7%), while guideline adherence (66.7%) and unsafe recommendations (33.3%) remained unchanged. Overall, structured prompting yielded the largest improvements in guideline grading accuracy and conversational reasoning across multiple models. Conclusion: Structured prompting substantially enhances LLM performance for acute stroke thrombolysis CDS. Notably, some models, including the proprietary GPT-4o, o3, and GPT-5.2 Thinking, and the open-source reasoning model R1-1776, achieved excellent safety and adherence with structured prompts. For clinical deployment of any LLM, structured prompts are crucial, and vigilant human oversight remains essential.